If anything remains unclear or unanswered after this session, or if you spot mistakes or typos, contact:
Twitter: @liza_p_semenova
Github: elizavetasemenova
E-mail: elizaveta.p.semenova@gmail.com
Julia:
Bayesian Inference:
Outro
We will use Julia for the practical part:
println("All models are wrong, but some are useful.")
All models are wrong, but some are useful.
Follow instructions here: https://github.com/elizavetasemenova/EmbracingUncertainty
Python and R versions of the Bayesian block are also available in this repository.
##############################################
# basic Julia syntax
##############################################
println("Hello, world!")
Hello, world!
3 - 2.4
0.6000000000000001
println("computed ", 3 - 4.5)
computed -1.5
println("Hello" * " Julia!")
Hello Julia!
" who" ^ 3 * ", who let the dogs out" # do you know this song?
" who who who, who let the dogs out"
println((50 / 60) ^ 2)
0.6944444444444445
x = 9
y = 7
x % y
2
rem(9, 7)
2
x ÷ y
1
x \ y
0.7777777777777778
Beware:
false && true || true
true
Julia allows one to omit the multiplication operator when writing expressions.
x = 3
4x^2+x
39
4+1<2||2+2<1
false
a=true
b=!!!!a
true
Type '\theta' followed by Tab to get the Unicode symbol θ.
This lets us copy math equations directly, as if we were writing them by hand.
θ = 3
3
2θ
6
typeof(1)
Int64
typeof("Hello, world!")
String
typeof("αβγδ")
String
typeof("H")
String
typeof('H')
Char
typeof(1+3+5.)
Float64
Array{Int64}(undef, 3)
3-element Array{Int64,1}:
4565960688
4369189104
4565959232
Array{String}(undef, 3)
3-element Array{String,1}:
#undef
#undef
#undef
'a'==="a"
false
x=3
y=2
a = Array{Integer,2}(undef, x, y)
3×2 Array{Integer,2}:
#undef #undef
#undef #undef
#undef #undef
x = Array{Int64}(undef,11, 12)
typeof(x)
Array{Int64,2}
a, b, c = cos(0.2), log(10), abs(-1.22) # multiple assignment
(0.9800665778412416, 2.302585092994046, 1.22)
myfunc(x) = 20*x
myfunc (generic function with 1 method)
myfunc(2)
40
add(x,y) = x + y
add (generic function with 1 method)
add(33, -22.2)
10.8
function nextfunc(a, b, c)
    a*b + c
end
nextfunc (generic function with 1 method)
nextfunc(7,5,3)
38
function print_type(x)
    println("The type of testvar is $(typeof(x)) and the value of testvar is $x")
end
print_type (generic function with 1 method)
a = ['1',2.]
print_type(a)
The type of testvar is Array{Any,1} and the value of testvar is Any['1', 2.0]
function f(x)
    return 2x
    3x  # never reached: the function already returned on the previous line
end
f(5)
10
cos_func(x) = cos(x)
cos_func (generic function with 1 method)
cos_func(.7)
0.7648421872844885
cos_func(adj, hyp) = adj/hyp
cos_func (generic function with 2 methods)
cos_func(.7)
0.7648421872844885
cos_func(12, 13)
0.9230769230769231
methods(cos_func)
?cos_func
search: cos_func
No documentation found.
cos_func is a Function.
# 2 methods for generic function "cos_func":
[1] cos_func(x) in Main at In[40]:1
[2] cos_func(adj, hyp) in Main at In[42]:1
cos_func(theta::Float64) = cos(theta) # the :: annotation restricts this method to Float64 arguments
cos_func (generic function with 3 methods)
Use '?' to search. For example, '?cos':
?cos
search: cos cosh cosd cosc cospi cos_func acos acosh acosd sincos const close
cos(x)
Compute cosine of x, where x is in radians.
cos(A::AbstractMatrix)
Compute the matrix cosine of a square matrix A.
If A is symmetric or Hermitian, its eigendecomposition (eigen) is used to compute the cosine. Otherwise, the cosine is determined by calling exp.
julia> cos(fill(1.0, (2,2)))
2×2 Array{Float64,2}:
0.291927 -0.708073
-0.708073 0.291927
x = [1, 6, 2, 4, 7, 2, 76, 5]
8-element Array{Int64,1}:
1
6
2
4
7
2
76
5
size(x)
(8,)
length(x)
8
typeof(x)
Array{Int64,1}
x = collect(1:7)
7-element Array{Int64,1}:
1
2
3
4
5
6
7
x = range(0, stop = 1, length = 10)
0.0:0.1111111111111111:1.0
x = collect(x)
10-element Array{Float64,1}:
0.0
0.1111111111111111
0.2222222222222222
0.3333333333333333
0.4444444444444444
0.5555555555555556
0.6666666666666666
0.7777777777777778
0.8888888888888888
1.0
x = fill(5, 4)
4-element Array{Int64,1}:
5
5
5
5
x + 2
MethodError: no method matching +(::Array{Int64,1}, ::Int64)
Closest candidates are:
+(::Any, ::Any, !Matched::Any, !Matched::Any...) at operators.jl:502
+(!Matched::Complex{Bool}, ::Real) at complex.jl:292
+(!Matched::Missing, ::Number) at missing.jl:93
...
Stacktrace:
[1] top-level scope at In[57]:1
x .+ 2
4-element Array{Int64,1}:
7
7
7
7
x .* x
4-element Array{Int64,1}:
25
25
25
25
data = [1.6800483 -1.641695388;
0.501309281 -0.977697538;
1.528012113 0.52771122;
1.70012253 1.711524991;
1.992493625 1.891000015]
5×2 Array{Float64,2}:
1.68005 -1.6417
0.501309 -0.977698
1.52801 0.527711
1.70012 1.71152
1.99249 1.891
size(data)
(5, 2)
typeof(data)
Array{Float64,2}
data[:,1] # all rows, 1st column
5-element Array{Float64,1}:
1.6800483
0.501309281
1.528012113
1.70012253
1.992493625
data = [[3,2,1] [3,2,1] [3,2,1] [3,2,1] [3,2,1] [3,2,1] [3,2,1] [3,2,1] [6,5,4]]
3×9 Array{Int64,2}:
3 3 3 3 3 3 3 3 6
2 2 2 2 2 2 2 2 5
1 1 1 1 1 1 1 1 4
data[2,:]
9-element Array{Int64,1}:
2
2
2
2
2
2
2
2
5
rows, cols = size(data)
println(rows)
println(cols)
3
9
for num = 1:2:length(x)
    println("num is now $num")
end
num is now 1
num is now 3
values = [ "first element", 'θ' , 42] # an array
for x in values
    println("The value of x is now $x")
end
The value of x is now first element
The value of x is now θ
The value of x is now 42
x = [3, 2, 1]
count=1
for i in x
    x[i] = count
    count = count + 1
end
println(x[3])
println(x)
1
[3, 2, 1]
if 5 > 3
    println("test passed")
end
test passed
using Plots
gr()
Plots.GRBackend()
x = cos.( - 10:0.1:10)
plot(1:length(x), x)
plot!(title = "First plot", size = [300, 300], legend = false) # modify existing plot
hline!([0])
plot(1:length(x), x,
size = [300, 300],
legend=false,
line= :scatter,
marker = :hex)
x = collect(1:0.1:7)
f(x) = 2 - 2x + x^2/4 + sin(2x)
plot(x, f)
#savefig("Plot_name.png") # Saved png format
using Distributions
d = Normal()
Normal{Float64}(μ=0.0, σ=1.0)
n_draws = 1000
draws = rand(d, n_draws)
histogram(draws, size = [300, 300], bins = 40, leg = false, norm = true)
[pdf(d, x) for x in -3:1:3] # evaluate PDF/PMF
7-element Array{Float64,1}:
0.0044318484119380075
0.05399096651318806
0.24197072451914337
0.3989422804014327
0.24197072451914337
0.05399096651318806
0.0044318484119380075
shape = (3, 2)
rand(Uniform(0,1), shape)
3×2 Array{Float64,2}:
0.10616 0.115299
0.982672 0.0514775
0.867987 0.673957
Please run these commands now:
using Distributions
using Plots
using StatsBase
using Turing
Bayesian inference is the process of deducing properties of a probability distribution from data using Bayes’ theorem.
Bayes' theorem relies heavily on the notion of conditional probability.
Conditional probability of A given B:
$$P(A \vert B)=\frac{P(A \cap B)}{P(B)}.$$
$P(A \vert B)$ - probability of event A given that B has occurred
$P(A \cap B)$ - probability that both A and B occurred
We can derive
$$P(A \vert B) = P(B \vert A) \frac{P(A)}{P(B)}.$$
The conditional probability of A given B is the conditional probability of B given A scaled by the relative probability of A compared to B.
Find the probability that a patient has the flu, given that they have a high temperature.
A = has the flu
B = has a fever
$$P(\text{flu} \vert \text{fever})= P(\text{fever} \vert \text{flu}) \frac{P(\text{flu})}{P(\text{fever})}$$
We know that the probability of having a fever at this time of the year is 10%, the probability of having the flu at this time of the year is 7%, and among all people who have the flu, 70% have a fever.
$$ 70\% * 7\% / 10\% = 49\%.$$
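This back-of-the-envelope calculation is easy to verify directly:

```julia
# Bayes' theorem for the flu example:
# P(flu | fever) = P(fever | flu) * P(flu) / P(fever)
p_fever_given_flu = 0.70   # among people with the flu, 70% have a fever
p_flu             = 0.07   # probability of having the flu this time of year
p_fever           = 0.10   # probability of having a fever this time of year

p_flu_given_fever = p_fever_given_flu * p_flu / p_fever
println(p_flu_given_fever)   # ≈ 0.49
```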
We would like to test whether a message is spam by checking whether it
contains a predefined set of words or word combinations that are
common in spam messages ("Free investment", "Dear email"):
$$P(\text{spam } \vert \text{ words}) = \frac{ P(\text{words } \vert \text{ spam}) P(\text{spam})}{P(\text{words})}$$
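As a sketch of this formula with made-up numbers (all probabilities below are hypothetical, chosen only for illustration), the denominator $P(\text{words})$ can be expanded with the law of total probability:

```julia
# Hypothetical illustration of the spam-filter formula
p_words_given_spam = 0.8    # P(words | spam)
p_words_given_ham  = 0.05   # P(words | not spam)
p_spam             = 0.4    # P(spam), the assumed base rate

# law of total probability for P(words)
p_words = p_words_given_spam * p_spam + p_words_given_ham * (1 - p_spam)

p_spam_given_words = p_words_given_spam * p_spam / p_words
println(p_spam_given_words)
```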
In practice, we usually want to estimate the unknown model parameters $$\theta = (\theta_1, ..., \theta_k),$$ given the observed data $$y = (y_1, ..., y_n)$$ using Bayes' rule: $$P(\theta \vert y) = \frac{P(y \vert \theta) P(\theta)}{P(y)}.$$
In the continuous case, we speak of probability densities $f(\cdot)$.
The posterior distribution is formed as
$$f(\theta \vert y) \propto f(y \vert \theta) f(\theta).$$
$f(\theta)$ is the prior and expresses our beliefs about $\theta$
$f(y \vert \theta)$ is the likelihood of data $y$ as a function of $\theta$
we dropped $f(y)$ because the average likelihood across all parameter values $$f(y) = \int_{\theta_1} ... \int_{\theta_k} f(\theta) f(y \vert \theta) d\theta_1 ... d\theta_k = E_{\theta}f(y \vert \theta)$$ does not depend on $\theta$ (and hence does not influence the inference) and is very hard to compute.
Likelihood
Prior
Posterior
Frequentist doctor:
Bayesian doctor:
I.e. frequentist inference is like a doctor who never reads a patient's history, even for patients with chronic conditions.
Frequentist:
Bayesian:
Remark: historically, the Bayesian approach appeared earlier than the frequentist one.
Why have we been using frequentist methods all these years?
Because we didn't have computers! Bayesian inference is very hard to carry out by hand.
Prior knowledge
Flexible in uncertainty modeling, particularly when only a small amount of data is available
Very flexible model formulation accounting for the mechanistic knowledge about a system
Note:
analytically
elegant, but rarely possible.
numerically
Create posterior samples describing the distributions of the parameters.
We achieve this by exploring the parameter space to find the most probable combinations of parameters.
We then treat the obtained samples as new data from which to extract information about the parameters, such as the mean, credible intervals, or other statistics.
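As a minimal sketch of this last step (using synthetic draws from a normal distribution as a stand-in for real posterior samples):

```julia
using Distributions, Statistics

# stand-in for posterior samples produced by a sampler
draws = rand(Normal(0.6, 0.05), 10_000)

post_mean = mean(draws)                    # posterior mean
ci95 = quantile(draws, [0.025, 0.975])     # equal-tailed 95% credible interval
println(post_mean, "  ", ci95)
```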
informative / non-informative
conjugate priors guarantee that the posterior has an easily calculable form
Some conjugate families of distributions
(Robert, Casella, "Monte Carlo Statistical Methods")
using Distributions
using Plots
using StatsBase
##############################################
# prior x likelihood = posterior
##############################################
success=6
tosses=9
# Create a distribution with n = 9 (e.g. tosses) and p = 0.5.
d = Binomial(tosses, 0.5)
pdf(d, success)
0.1640625000000001
# define grid
grid_points = 100
p_grid = range(0, stop = 1, length = grid_points)
# compute likelihood at each point in the grid
likelihood = [pdf(Binomial(tosses, p), success) for p in p_grid]
# define prior
prior = ones(length(p_grid));
# As a uniform prior has been used, the unstandardized posterior is equal to the likelihood
# compute product of likelihood and prior
posterior = likelihood .* prior;
function computePosterior(likelihood, prior)
    # compute product of likelihood and prior
    unstd_posterior = likelihood .* prior
    # standardize posterior
    posterior = unstd_posterior / sum(unstd_posterior)
    p1 = plot(p_grid, prior, title = "Prior")
    p2 = plot(p_grid, likelihood, title = "Likelihood")
    p3 = plot(p_grid, posterior, title = "Posterior")
    plot(p1, p2, p3, layout=(1, 3), label="")
end
computePosterior (generic function with 1 method)
prior1 = ones(length(p_grid))
posterior1 = computePosterior(likelihood, prior1)
With a flat (uniform) prior, the posterior has the same shape as the likelihood.
#prior2 = 2 * (p_grid .>= 0.5)
prior2 = 0.5 * (p_grid .>= 0.5)
posterior2 = computePosterior(likelihood, prior2)
The posterior vanishes at the points where the prior vanishes. I.e. we need to be careful and avoid assigning zero prior probability to regions for which we have no evidence that those values are impossible.
prior3 = exp.(-5 * abs.(p_grid .- 0.5))
posterior3 = computePosterior(likelihood, prior3)
Some priors, such as the exponential and Laplace, allow one to obtain a "shrinkage" effect.
is a group of algorithms which use repeated random sampling to obtain numerical estimates of unknown quantities/parameters.
(originated within the Manhattan Project thanks to Stanislaw Ulam)
Find approximate value of $\pi.$
(Note: we are solving a deterministic problem using random number generation)
##############################################
# the Monte Carlo method - compute pi
##############################################
function in_circle(x, y, r)
    sqrt(x^2 + y^2) <= r
end
in_circle (generic function with 1 method)
function approx_pi(r, n)
    xs, ys, cols = [], [], []
    count = 0
    for i in range(1, step=1, stop=n)
        x = rand(Uniform(0,1))
        y = rand(Uniform(0,1))
        append!(xs, x)
        append!(ys, y)
        if in_circle(x, y, r)
            count += 1
            cols = vcat(cols, :red)
        else
            cols = vcat(cols, :steelblue)
        end
    end
    pi_appr = round(4 * count/n, digits = 3)
    pl = scatter(xs,
                 ys,
                 color=cols,
                 size=(200,200),
                 legend = false,
                 xticks = false,
                 yticks = false,
                 framestyle = :box,
                 title = "pi (approximately) = " * string(pi_appr),
                 titlefontsize=font(7, "Calibri"))
    display(pl)
end
approx_pi (generic function with 1 method)
r = 1
n = 100
for n in 5 * 10 .^[1, 2, 3]
    approx_pi(r, n)
end
(Again, let's solve a deterministic problem using random sampling)
Find the value of the integral
$$\int_a^b f(x)dx. $$
Monte Carlo integration estimates this integral by estimating the fraction of random points that fall below $f(x)$.
In our context, we are interested in estimating expectations
$$ E[h(x)] = \int h(x)f(x)dx,$$
which can be estimated with
$$ \bar{h}_n = \frac{1}{n} \sum_{i=1}^n h(x_i),$$
where $x_i \sim f$ is a draw from the density $f$.
The convergence of Monte Carlo integration is $O(n^{-1/2})$, independent of the dimensionality. Hence Monte Carlo integration generally beats numerical integration (quadrature) for moderate- and high-dimensional problems, since the convergence of quadrature degrades with the dimension $d$: a rule with one-dimensional error $O(n^{-k})$ has $d$-dimensional error of only $O(n^{-k/d})$.
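The estimator $\bar{h}_n$ can be sketched directly: drawing $x_i \sim \text{Uniform}(0,1)$, the sample mean of $e^{x_i}$ estimates $\int_0^1 e^x dx = e - 1$, and the error shrinks roughly like $n^{-1/2}$:

```julia
using Distributions, Statistics

truth = exp(1) - 1                   # exact value of the integral
h(x) = exp(x)
for n in 10 .^ [2, 4, 6]
    x = rand(Uniform(0, 1), n)
    est = mean(h.(x))                # h̄_n = (1/n) Σ h(x_i)
    println(n, "  ", est, "  error = ", abs(est - truth))
end
```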
Estimate the integral $\int_0^1 e^x dx$.
##############################################
# the Monte Carlo method - integration
##############################################
exp(1) - exp(0)
1.718281828459045
x = range(0, stop = 1, length = 100)
plot(x, exp.(x), size= [200,200], legend= false)
pts = rand(Uniform(0,1), (100, 2)) # sample uniformly in the square
pts[:, 2] *= exp(1)
cols = fill(:steelblue, 100)
for i in range(1, step=1, stop=100)
    if pts[i,2] > exp(pts[i,1]) # acceptance / rejection step
        cols[i] = :red
    end
end
scatter!(pts[:, 1], pts[:, 2], color = cols, size=[250, 250], legend = false, xlim = [0,1], ylim = [0, exp(1)])
# Monte Carlo approximation
for n in 10 .^[1, 2, 3, 4, 5, 6, 7, 8]
    pts = rand(Uniform(0,1), (n, 2))
    pts[:, 2] *= exp(1)
    count = sum(pts[:, 2] .< exp.(pts[:, 1]))
    volume = exp(1) * 1 # volume of region
    sol = (volume * count)/n
    println(sol)
end
1.3591409142295225
1.4678721873678844
1.7152358337576574
1.7549227484531595
1.7204277520500142
1.7188538668713365
1.7179524846170195
1.7184223939967052
Is the coin biased if we observed (H, H, T, H, ... , T, H)?
How do we derive the MLE estimate? The likelihood of observing $h$ heads in $n = h + t$ tosses is $L(\theta) = \theta^{h} (1-\theta)^{t}$. Setting the derivative of the log-likelihood $h \log \theta + t \log(1-\theta)$ to zero gives $h/\theta = t/(1-\theta)$,
and so
$$ \hat{\theta} = \frac{\text{heads}}{\text{heads + tails}} $$
##############################################
# coin tossing
##############################################
n = 4
h = 3
p = h/n
0.75
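We can sanity-check that $\hat{\theta} = h/n$ maximizes the Binomial likelihood by evaluating the log-likelihood on a grid (a quick numerical check, not a proof):

```julia
using Distributions

n, h = 4, 3
thetas = range(0.001, stop = 0.999, length = 999)
loglik = [logpdf(Binomial(n, θ), h) for θ in thetas]
theta_hat = thetas[argmax(loglik)]
println(theta_hat)   # close to h/n = 0.75
```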
Beta distribution:
$$ \text{Beta}_\theta(a,b) = C \, \theta^{a-1} (1 - \theta)^{b-1} $$
If we compute the posterior distribution analytically, we will obtain a Beta distribution with new parameters.
a, b = 10, 10 # hyperparameters
prior = Beta(a, b) # prior
post = Beta(h+a, n-h+b) # posterior
Beta{Float64}(α=13.0, β=11.0)
function beta_binomial(n, h, a, b)
    # frequentist
    p = h/n
    mu = mean(Binomial(n, p))
    # Bayesian
    thetas = range(0, stop=1, length=200)
    prior = pdf.(Beta(a, b), thetas)
    post = pdf.(Beta(h+a, n-h+b), thetas)
    likelihood = n * [pdf(Binomial(n, p), h) for p in thetas]; # scaled by n for visualization
    plot(thetas,
         prior,
         size= [400, 400],
         label = "Prior",
         color = :blue,
         xlim = [0, 1],
         xlabel = "theta",
         ylabel = "Density")
    plot!(thetas, post, label = "Posterior", color = :red)
    plot!(thetas, likelihood, label="Likelihood", color = :green, legend = :topleft)
    vline!([(h+a-1)/(n+a+b-2)], color = :red, linestyle = :dash, label="MAP")
    vline!([mu / n], color = :green, linestyle = :dash, label="MLE")
end
beta_binomial (generic function with 1 method)
beta_binomial(100, 80, 10, 10)
beta_binomial(4, 3, 10, 10)
beta_binomial(4, 3, 2, 2)
beta_binomial(4, 3, 1, 1)
Markov Chain Monte Carlo idea:
under certain conditions, the Markov chain will have a unique stationary distribution
we set up an acceptance criterion for each draw, based on comparing successive states with respect to a target distribution, that ensures that the stationary distribution is the posterior distribution we are searching for
there is no need to evaluate the potentially intractable marginal likelihood
after a sufficient number of iterations, the Markov chain of accepted draws will converge to the stationary distribution; we can then use those samples as (correlated) draws from the posterior distribution and compute functions of the posterior distribution
Start with an initial guess for $\theta$
Choose a new proposed value as $\theta_p = \theta + \delta_\theta$, $\delta_\theta \sim N(0, \sigma).$
Here we have chosen the proposal distribution to be $N(0, \sigma).$
If $g$ is the posterior probability, calculate the ratio $\rho = \frac{g(\theta_p \mid X)}{g(\theta \mid X)}$
(adjust for symmetry of the proposal distribution)
##############################################
# Metropolis-Hastings
##############################################
function target(likelihood, prior, n, h, theta)
    if (theta < 0 || theta > 1)
        return 0
    else
        return (pdf(likelihood(n, theta), h) * pdf(prior, theta))
    end
end
target (generic function with 1 method)
n = 100
h = 61
a = 10
b = 10
likelihood = Binomial
prior = Beta(a, b)
sigma = 0.3
0.3
naccept = 0
theta = 0.1
niters = 10000
samples = zeros(niters+1)
samples[1] = theta
for i = 1:niters
    theta_p = theta + rand(Normal(0, sigma))
    rho = min(1, target(likelihood, prior, n, h, theta_p)/target(likelihood, prior, n, h, theta))
    u = rand(Uniform(0,1))
    if u < rho
        naccept += 1
        theta = theta_p
    end
    samples[i+1] = theta
end
println("Portion of accepted steps = " * string(naccept/niters))
Portion of accepted steps = 0.2029
nmcmc = Int(round(length(samples)/2))
post = Beta(h+a, n-h+b)
thetas = range(0, stop=1, length=200)
histogram(samples[nmcmc:length(samples)] ,
size = [500, 300],
label="Distribution of posterior samples", alpha = 0.5,
legend = :topleft)
histogram!(rand(prior, nmcmc),
label = "Distribution of prior samples", alpha = 0.5)
plot!(thetas, 50*[pdf(post, theta) for theta in thetas], color = :red, label = "True posterior")
We run the chain for $N$ iterations and discard the first $B$ samples. This is called burn-in.
We can run several parallel versions of the algorithm. Each of them is called a chain.
Neighbouring samples contain similar information. We might want to keep only every second, or fifth, or tenth sample. This is called thinning.
Rigorously assessing convergence is an unsolved problem, but there are several tools we can use to convince ourselves that an MCMC chain has converged, such as traceplots, the potential scale reduction statistic $\hat{R}$, and the effective sample size.
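Burn-in and thinning are just slicing operations on the array of draws. A sketch (the draws here are synthetic stand-ins; the burn-in length and thinning factor are arbitrary choices):

```julia
samples = randn(10_000)      # stand-in for MCMC draws
burnin  = 5_000              # discard the first B draws (burn-in)
thin    = 5                  # keep every 5th remaining draw (thinning)

posterior_draws = samples[(burnin + 1):thin:end]
println(length(posterior_draws))   # 1000
```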
function mh_coin(niters, n, h, theta, likelihood, prior, sigma)
    samples = [theta]
    while length(samples) < niters
        theta_p = theta + rand(Normal(0, sigma))
        rho = min(1, target(likelihood, prior, n, h, theta_p)/target(likelihood, prior, n, h, theta))
        u = rand(Uniform(0,1))
        if u < rho
            theta = theta_p
        end
        append!(samples, theta)
    end
    return samples
end
mh_coin (generic function with 1 method)
n = 100
h = 61
lik = Binomial
prior = Beta(a, b)
sigma = 0.05
niters = 100
100
chains = [mh_coin(niters, n, h, theta, lik, prior, sigma) for theta in range(0.1, stop=1, step=0.2)];
Compare multiple chains
p = plot(chains[1], size= [500, 500], legend =:false, xlim = [0, niters], ylim = [0, 1])
for i in 2:length(chains)
    plot!(chains[i])
end
display(p)
Was it painful to write a sampler by hand?
If not, bear in mind that we only wrote the simplest one possible! Sampling algorithms can get very complicated.
for chain in chains
    pl = plot(chain)
    display(pl)
end
Luckily, we do not need to write a sampler by hand every time, because probabilistic programming languages (PPLs) are there to help.
A PPL allows us to formalize a Bayesian model and perform inference with the help of powerful algorithms. A user only needs to formulate the model (and maybe choose a sampler) and press the inference button.
The list of currently existing PPLs is overwhelmingly long and only keeps growing:
to name a few.
There are already a couple of PPLs accessible from, or native to, Julia.
We will work with Turing.
##############################################
# Turing
##############################################
using Turing
Model creation
@model mod(y) = begin
    # model definition
end
┌ Warning: Model definition seems empty, still continue.
└ @ Turing.Core /Users/kcft114/.julia/packages/Turing/m05p3/src/core/compiler.jl:494
mod (generic function with 2 methods)
n = 100 # number of trials
h = 61 # number of successes
niter = 10000
10000
@model coin(n, h) = begin
    # prior
    p ~ Beta(2, 2)
    # likelihood
    h ~ Binomial(n, p)
end
coin (generic function with 3 methods)
Sampling
ch = sample(coin(n,h), NUTS(niter, 0.65));
┌ Info: Found initial step size
│ init_ϵ = 0.4
└ @ Turing.Inference /Users/kcft114/.julia/packages/Turing/m05p3/src/inference/hmc.jl:365
┌ Info: Finished 1000 adapation steps
│ adaptor = StanHMCAdaptor(n_adapts=1000, pc=DiagPreconditioner, ssa=NesterovDualAveraging(γ=0.05, t_0=10.0, κ=0.75, δ=0.65, state.ϵ=1.997853682146408), init_buffer=75, term_buffer=50)
│ τ.integrator = Leapfrog(ϵ=2.0)
│ h.metric = DiagEuclideanMetric([0.0385448])
└ @ AdvancedHMC /Users/kcft114/.julia/packages/AdvancedHMC/YWXfk/src/sampler.jl:67
┌ Info: Finished 10000 sampling steps in 0.585773358 (s)
│ h = Hamiltonian(metric=DiagEuclideanMetric([0.0385448]))
│ τ = NUTS{Multinomial}(integrator=Leapfrog(ϵ=2.0), max_depth=5), Δ_max=1000.0)
│ EBFMI(Hs) = 13679.723818325752
│ mean(αs) = 0.539336737579122
└ @ AdvancedHMC /Users/kcft114/.julia/packages/AdvancedHMC/YWXfk/src/sampler.jl:77
show(ch)
Object of type Chains, with data of type 9000×11×1 Array{Union{Missing, Float64},3}
Log evidence = 0.0
Iterations = 1:9000
Thinning interval = 1
Chains = 1
Samples per chain = 9000
internals = eval_num, lp, acceptance_rate, hamiltonian_energy, is_accept, log_density, n_steps, numerical_error, step_size, tree_depth
parameters = p
2-element Array{ChainDataFrame,1}
Summary Statistics
│ Row │ parameters │ mean │ std │ naive_se │ mcse │ ess │ r_hat │
│ │ Symbol │ Float64 │ Float64 │ Float64 │ Float64 │ Any │ Any │
├─────┼────────────┼─────────┼───────────┼─────────────┼────────────┼─────────┼─────────┤
│ 1 │ p │ 0.60588 │ 0.0464697 │ 0.000489834 │ 0.00058591 │ 5552.61 │ 1.00008 │
Quantiles
│ Row │ parameters │ 2.5% │ 25.0% │ 50.0% │ 75.0% │ 97.5% │
│ │ Symbol │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │
├─────┼────────────┼──────────┼──────────┼──────────┼──────────┼──────────┤
│ 1 │ p │ 0.512391 │ 0.575263 │ 0.606619 │ 0.636669 │ 0.697461 │
# read samples into array
p = convert(Array{Float64}, ch[:p].value.data[:,:,1][:,1]);
histogram(p, size = [300, 300], legend = false, title = "posterior density")
# traceplot
plot(p, size = [300, 300], legend = false, title = "traceplot")
function plot_par(par)
    p1 = histogram(par, size = [400, 300], legend = false, title = "posterior density")
    p2 = plot(par, title = "traceplot")
    plot(p1, p2, layout=(1, 2), label="")
end
plot_par (generic function with 1 method)
plot_par(p)
Hierarchical models put priors on the hyperparameters.
$$ y \sim f(y \mid \theta) $$
$$ \theta \sim h(\theta \mid \sigma) $$
More levels of hierarchy are possible.
In this way a hierarchical model allows information to be shared between the parameters $\theta,$ since they might not be independent, but rather drawn from a common distribution.
##############################################
# hierarchical models
##############################################
@model coin_hier(n, h) = begin
    # hyperparameters
    alpha_hyp ~ InverseGamma(10, 2)
    beta_hyp ~ InverseGamma(10, 2)
    # prior
    p ~ Beta(alpha_hyp, beta_hyp)
    # likelihood
    h ~ Binomial(n, p)
end
coin_hier (generic function with 3 methods)
niter = 20000
20000
ch = sample(coin_hier(n,h), NUTS(niter, 0.30));
┌ Info: Found initial step size
│ init_ϵ = 0.8
└ @ Turing.Inference /Users/kcft114/.julia/packages/Turing/m05p3/src/inference/hmc.jl:365
┌ Warning: The current proposal will be rejected due to numerical error(s).
│ isfiniteθ = true
│ isfiniter = true
│ isfiniteℓπ = true
│ isfiniteℓκ = false
└ @ AdvancedHMC /Users/kcft114/.julia/packages/AdvancedHMC/YWXfk/src/hamiltonian.jl:36
┌ Info: Finished 1000 adapation steps
│ adaptor = StanHMCAdaptor(n_adapts=1000, pc=DiagPreconditioner, ssa=NesterovDualAveraging(γ=0.05, t_0=10.0, κ=0.75, δ=0.3, state.ϵ=2.6513610946627013), init_buffer=75, term_buffer=50)
│ τ.integrator = Leapfrog(ϵ=2.65)
│ h.metric = DiagEuclideanMetric([0.0906861, 0.0890539, 0.04 ...])
└ @ AdvancedHMC /Users/kcft114/.julia/packages/AdvancedHMC/YWXfk/src/sampler.jl:67
┌ Info: Finished 20000 sampling steps in 0.658708255 (s)
│ h = Hamiltonian(metric=DiagEuclideanMetric([0.0906861, 0.0890539, 0.04 ...]))
│ τ = NUTS{Multinomial}(integrator=Leapfrog(ϵ=2.65), max_depth=5), Δ_max=1000.0)
│ EBFMI(Hs) = 18482.23049922853
│ mean(αs) = 0.05218346942083084
└ @ AdvancedHMC /Users/kcft114/.julia/packages/AdvancedHMC/YWXfk/src/sampler.jl:77
show(ch)
Object of type Chains, with data of type 19000×13×1 Array{Union{Missing, Float64},3}
Log evidence = 0.0
Iterations = 1:19000
Thinning interval = 1
Chains = 1
Samples per chain = 19000
internals = eval_num, lp, acceptance_rate, hamiltonian_energy, is_accept, log_density, n_steps, numerical_error, step_size, tree_depth
parameters = alpha_hyp, beta_hyp, p
2-element Array{ChainDataFrame,1}
Summary Statistics
│ Row │ parameters │ mean │ std │ naive_se │ mcse │ ess │ r_hat │
│ │ Symbol │ Float64 │ Float64 │ Float64 │ Float64 │ Any │ Any │
├─────┼────────────┼──────────┼───────────┼─────────────┼────────────┼─────────┼──────────┤
│ 1 │ alpha_hyp │ 0.236544 │ 0.0841867 │ 0.000610755 │ 0.00310413 │ 678.542 │ 0.999955 │
│ 2 │ beta_hyp │ 0.225032 │ 0.0783482 │ 0.000568398 │ 0.00312763 │ 487.225 │ 1.00011 │
│ 3 │ p │ 0.604955 │ 0.0530292 │ 0.000384714 │ 0.00247522 │ 262.195 │ 1.00004 │
Quantiles
│ Row │ parameters │ 2.5% │ 25.0% │ 50.0% │ 75.0% │ 97.5% │
│ │ Symbol │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │
├─────┼────────────┼──────────┼──────────┼──────────┼──────────┼──────────┤
│ 1 │ alpha_hyp │ 0.11868 │ 0.173982 │ 0.223655 │ 0.276504 │ 0.425165 │
│ 2 │ beta_hyp │ 0.111732 │ 0.166139 │ 0.21462 │ 0.271694 │ 0.397669 │
│ 3 │ p │ 0.48967 │ 0.574321 │ 0.606406 │ 0.645988 │ 0.696406 │
# read samples into array
p = convert(Array{Float64}, ch[:p].value.data[:,:,1][:,1]);
plot_par(p)
alpha_hyp = convert(Array{Float64}, ch[:alpha_hyp].value.data[:,:,1][:,1]);
plot_par(alpha_hyp)
beta_hyp = convert(Array{Float64}, ch[:beta_hyp].value.data[:,:,1][:,1]);
plot_par(beta_hyp)
##############################################
# normal distribution
##############################################
N = 2000
y = rand(Normal(0,1), N)
histogram(y, size = [300, 300], legend = false)
@model norm_mu(y) = begin
    sigma = 1
    # prior
    mu ~ Normal(0, 0.5)
    # likelihood
    for i in eachindex(y)
        y[i] ~ Normal(mu, sigma)
    end
end
norm_mu (generic function with 2 methods)
ch = sample(norm_mu(y), NUTS(niter, 0.65));
┌ Info: Found initial step size
│ init_ϵ = 0.025
└ @ Turing.Inference /Users/kcft114/.julia/packages/Turing/m05p3/src/inference/hmc.jl:365
┌ Info: Finished 1000 adapation steps
│ adaptor = StanHMCAdaptor(n_adapts=1000, pc=DiagPreconditioner, ssa=NesterovDualAveraging(γ=0.05, t_0=10.0, κ=0.75, δ=0.65, state.ϵ=1.4493213954879545), init_buffer=75, term_buffer=50)
│ τ.integrator = Leapfrog(ϵ=1.45)
│ h.metric = DiagEuclideanMetric([0.000417026])
└ @ AdvancedHMC /Users/kcft114/.julia/packages/AdvancedHMC/YWXfk/src/sampler.jl:67
┌ Info: Finished 10000 sampling steps in 1.670022066 (s)
│ h = Hamiltonian(metric=DiagEuclideanMetric([0.000417026]))
│ τ = NUTS{Multinomial}(integrator=Leapfrog(ϵ=1.45), max_depth=5), Δ_max=1000.0)
│ EBFMI(Hs) = 6483.333347059405
│ mean(αs) = 0.8365008389526049
└ @ AdvancedHMC /Users/kcft114/.julia/packages/AdvancedHMC/YWXfk/src/sampler.jl:77
mu = ch[:mu].value.data[:,:,1]
plot_par(mu)
num_chains = 4
chains = mapreduce(c -> sample(norm_mu(y), NUTS(niter, 0.65)), chainscat, 1:num_chains)
┌ Info: Found initial step size
│ init_ϵ = 0.025
└ @ Turing.Inference /Users/kcft114/.julia/packages/Turing/m05p3/src/inference/hmc.jl:365
┌ Info: Finished 1000 adapation steps
│ adaptor = StanHMCAdaptor(n_adapts=1000, pc=DiagPreconditioner, ssa=NesterovDualAveraging(γ=0.05, t_0=10.0, κ=0.75, δ=0.65, state.ϵ=1.3751185160680217), init_buffer=75, term_buffer=50)
│ τ.integrator = Leapfrog(ϵ=1.38)
│ h.metric = DiagEuclideanMetric([0.00041332])
└ @ AdvancedHMC /Users/kcft114/.julia/packages/AdvancedHMC/YWXfk/src/sampler.jl:67
┌ Info: Finished 10000 sampling steps in 1.754149703 (s)
│ h = Hamiltonian(metric=DiagEuclideanMetric([0.00041332]))
│ τ = NUTS{Multinomial}(integrator=Leapfrog(ϵ=1.38), max_depth=5), Δ_max=1000.0)
│ EBFMI(Hs) = 12339.956720005377
│ mean(αs) = 0.8554619551130905
└ @ AdvancedHMC /Users/kcft114/.julia/packages/AdvancedHMC/YWXfk/src/sampler.jl:77
┌ Info: Found initial step size
│ init_ϵ = 0.025
└ @ Turing.Inference /Users/kcft114/.julia/packages/Turing/m05p3/src/inference/hmc.jl:365
┌ Info: Finished 1000 adapation steps
│ adaptor = StanHMCAdaptor(n_adapts=1000, pc=DiagPreconditioner, ssa=NesterovDualAveraging(γ=0.05, t_0=10.0, κ=0.75, δ=0.65, state.ϵ=1.441049484955168), init_buffer=75, term_buffer=50)
│ τ.integrator = Leapfrog(ϵ=1.44)
│ h.metric = DiagEuclideanMetric([0.000498292])
└ @ AdvancedHMC /Users/kcft114/.julia/packages/AdvancedHMC/YWXfk/src/sampler.jl:67
┌ Info: Finished 10000 sampling steps in 1.455153829 (s)
│ h = Hamiltonian(metric=DiagEuclideanMetric([0.000498292]))
│ τ = NUTS{Multinomial}(integrator=Leapfrog(ϵ=1.44), max_depth=5), Δ_max=1000.0)
│ EBFMI(Hs) = 12973.808820742
│ mean(αs) = 0.8070678849330645
└ @ AdvancedHMC /Users/kcft114/.julia/packages/AdvancedHMC/YWXfk/src/sampler.jl:77
┌ Info: Found initial step size
│ init_ϵ = 0.025
└ @ Turing.Inference /Users/kcft114/.julia/packages/Turing/m05p3/src/inference/hmc.jl:365
┌ Info: Finished 1000 adapation steps
│ adaptor = StanHMCAdaptor(n_adapts=1000, pc=DiagPreconditioner, ssa=NesterovDualAveraging(γ=0.05, t_0=10.0, κ=0.75, δ=0.65, state.ϵ=1.5273130022586943), init_buffer=75, term_buffer=50)
│ τ.integrator = Leapfrog(ϵ=1.53)
│ h.metric = DiagEuclideanMetric([0.00043099])
└ @ AdvancedHMC /Users/kcft114/.julia/packages/AdvancedHMC/YWXfk/src/sampler.jl:67
┌ Info: Finished 10000 sampling steps in 1.610108754 (s)
│ h = Hamiltonian(metric=DiagEuclideanMetric([0.00043099]))
│ τ = NUTS{Multinomial}(integrator=Leapfrog(ϵ=1.53), max_depth=5), Δ_max=1000.0)
│ EBFMI(Hs) = 6555.9308989341225
│ mean(αs) = 0.814720671247542
└ @ AdvancedHMC /Users/kcft114/.julia/packages/AdvancedHMC/YWXfk/src/sampler.jl:77
┌ Info: Found initial step size
│ init_ϵ = 0.025
└ @ Turing.Inference /Users/kcft114/.julia/packages/Turing/m05p3/src/inference/hmc.jl:365
┌ Info: Finished 1000 adapation steps
│ adaptor = StanHMCAdaptor(n_adapts=1000, pc=DiagPreconditioner, ssa=NesterovDualAveraging(γ=0.05, t_0=10.0, κ=0.75, δ=0.65, state.ϵ=1.40355160599994), init_buffer=75, term_buffer=50)
│ τ.integrator = Leapfrog(ϵ=1.4)
│ h.metric = DiagEuclideanMetric([0.000514812])
└ @ AdvancedHMC /Users/kcft114/.julia/packages/AdvancedHMC/YWXfk/src/sampler.jl:67
┌ Info: Finished 10000 sampling steps in 1.498614675 (s)
│ h = Hamiltonian(metric=DiagEuclideanMetric([0.000514812]))
│ τ = NUTS{Multinomial}(integrator=Leapfrog(ϵ=1.4), max_depth=5), Δ_max=1000.0)
│ EBFMI(Hs) = 7932.130209781657
│ mean(αs) = 0.8061749750402668
└ @ AdvancedHMC /Users/kcft114/.julia/packages/AdvancedHMC/YWXfk/src/sampler.jl:77
Object of type Chains, with data of type 9000×11×4 Array{Union{Missing, Float64},3}
Iterations = 1:9000
Thinning interval = 1
Chains = 1, 2, 3, 4
Samples per chain = 9000
internals = eval_num, lp, acceptance_rate, hamiltonian_energy, is_accept, log_density, n_steps, numerical_error, step_size, tree_depth
parameters = mu
2-element Array{ChainDataFrame,1}
Summary Statistics
. Omitted printing of 2 columns
│ Row │ parameters │ mean │ std │ naive_se │ mcse │
│ │ Symbol │ Float64 │ Float64 │ Float64 │ Float64 │
├─────┼────────────┼────────────┼───────────┼─────────────┼─────────────┤
│ 1 │ mu │ -0.0048235 │ 0.0224061 │ 0.000118091 │ 0.000164522 │
Quantiles
. Omitted printing of 1 columns
│ Row │ parameters │ 2.5% │ 25.0% │ 50.0% │ 75.0% │
│ │ Symbol │ Float64 │ Float64 │ Float64 │ Float64 │
├─────┼────────────┼───────────┼───────────┼─────────────┼───────────┤
│ 1 │ mu │ -0.048737 │ -0.020058 │ -0.00475555 │ 0.0103441 │
mu = chains[:mu].value.data
plot(mu[:,:,1], label ="chain 1")
plot!(mu[:,:,2], label ="chain 2")
plot!(mu[:,:,3], label ="chain 3")
plot!(mu[:,:,4], label ="chain 4")
histogram(mu[:,:,1], alpha = 0.5, label = "chain 1")
histogram!(mu[:,:,2], alpha = 0.5, label = "chain 2")
histogram!(mu[:,:,3], alpha = 0.5, label = "chain 3")
histogram!(mu[:,:,4], alpha = 0.5, label = "chain 4")
@model norm_mu_sigma(y) = begin
    # priors
    mu ~ Normal(0, 0.5)
    sigma ~ InverseGamma(2, 3)
    # likelihood
    for i in eachindex(y)
        y[i] ~ Normal(mu, sigma)
    end
end
norm_mu_sigma (generic function with 2 methods)
ch = sample(norm_mu_sigma(y), NUTS(niter, 0.65));
┌ Info: Found initial step size
│ init_ϵ = 0.0125
└ @ Turing.Inference /Users/kcft114/.julia/packages/Turing/m05p3/src/inference/hmc.jl:365
┌ Info: Finished 1000 adaptation steps
│ adaptor = StanHMCAdaptor(n_adapts=1000, pc=DiagPreconditioner, ssa=NesterovDualAveraging(γ=0.05, t_0=10.0, κ=0.75, δ=0.65, state.ϵ=1.2747417524413236), init_buffer=75, term_buffer=50)
│ τ.integrator = Leapfrog(ϵ=1.27)
│ h.metric = DiagEuclideanMetric([0.000470837, 0.000264778])
└ @ AdvancedHMC /Users/kcft114/.julia/packages/AdvancedHMC/YWXfk/src/sampler.jl:67
┌ Info: Finished 10000 sampling steps in 2.972704221 (s)
│ h = Hamiltonian(metric=DiagEuclideanMetric([0.000470837, 0.000264778]))
│ τ = NUTS{Multinomial}(integrator=Leapfrog(ϵ=1.27), max_depth=5), Δ_max=1000.0)
│ EBFMI(Hs) = 882.3506661582911
│ mean(αs) = 0.7930821554299076
└ @ AdvancedHMC /Users/kcft114/.julia/packages/AdvancedHMC/YWXfk/src/sampler.jl:77
show(ch)
Object of type Chains, with data of type 9000×12×1 Array{Union{Missing, Float64},3}
Log evidence = 0.0
Iterations = 1:9000
Thinning interval = 1
Chains = 1
Samples per chain = 9000
internals = eval_num, lp, acceptance_rate, hamiltonian_energy, is_accept, log_density, n_steps, numerical_error, step_size, tree_depth
parameters = sigma, mu
2-element Array{ChainDataFrame,1}
Summary Statistics
│ Row │ parameters │ mean │ std │ naive_se │ mcse │ ess │ r_hat │
│ │ Symbol │ Float64 │ Float64 │ Float64 │ Float64 │ Any │ Any │
├─────┼────────────┼─────────────┼───────────┼─────────────┼─────────────┼─────────┼─────────┤
│ 1 │ mu │ -0.00464717 │ 0.0227181 │ 0.00023947 │ 0.000252435 │ 8374.93 │ 0.99989 │
│ 2 │ sigma │ 0.989695 │ 0.015715 │ 0.000165651 │ 0.000135044 │ 9000.0 │ 0.99989 │
Quantiles
│ Row │ parameters │ 2.5% │ 25.0% │ 50.0% │ 75.0% │ 97.5% │
│ │ Symbol │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │
├─────┼────────────┼───────────┼────────────┼─────────────┼───────────┼───────────┤
│ 1 │ mu │ -0.048276 │ -0.0201671 │ -0.00480952 │ 0.0105537 │ 0.0399445 │
│ 2 │ sigma │ 0.959298 │ 0.979116 │ 0.989499 │ 1.00024 │ 1.0211 │
mu = ch[:mu].value.data[:,:,1]
sigma = ch[:sigma].value.data[:,:,1]
pl_mu = plot_par(mu)
vline!([0], color = :green, label="MLE")
pl_sigma = plot_par(sigma)
vline!([1], color = :green, label="MLE")
display(pl_mu)
display(pl_sigma)
Likelihood:
$$ y \sim N(ax+b, \sigma^2) $$
Priors:
$$ a \sim N(0, 100) $$
$$ b \sim N(0, 100) $$
$$ \sigma \sim U(0, 20) $$
##############################################
# linear regression
##############################################
n = 100
a_true = 6
b_true = 2
x = range(0, stop=1, length = n)
x = convert(Array, x)
y = a_true*x .+ b_true + rand(Normal(0,1), n);
plot(x, a_true*x .+ b_true, legend = false, size = [350, 350], color = :blue)
scatter!(x, y)
@model lin_reg(x, y) = begin
    a ~ Normal(0, 10)
    b ~ Normal(0, 10)
    lp = a * x .+ b
    s ~ InverseGamma(2, 3)
    for i in eachindex(y)
        y[i] ~ Normal(lp[i], sqrt(s))
    end
end
lin_reg (generic function with 3 methods)
niter = 20000
20000
ch = sample(lin_reg(x, y), NUTS(niter, 0.65));
┌ Info: Found initial step size
│ init_ϵ = 0.2
└ @ Turing.Inference /Users/kcft114/.julia/packages/Turing/m05p3/src/inference/hmc.jl:365
┌ Info: Finished 1000 adaptation steps
│ adaptor = StanHMCAdaptor(n_adapts=1000, pc=DiagPreconditioner, ssa=NesterovDualAveraging(γ=0.05, t_0=10.0, κ=0.75, δ=0.65, state.ϵ=0.4527591848079716), init_buffer=75, term_buffer=50)
│ τ.integrator = Leapfrog(ϵ=0.453)
│ h.metric = DiagEuclideanMetric([0.118204, 0.0407925, 0.018 ...])
└ @ AdvancedHMC /Users/kcft114/.julia/packages/AdvancedHMC/YWXfk/src/sampler.jl:67
┌ Info: Finished 20000 sampling steps in 3.594597784 (s)
│ h = Hamiltonian(metric=DiagEuclideanMetric([0.118204, 0.0407925, 0.018 ...]))
│ τ = NUTS{Multinomial}(integrator=Leapfrog(ϵ=0.453), max_depth=5), Δ_max=1000.0)
│ EBFMI(Hs) = 5710.433307092268
│ mean(αs) = 0.8579825157539616
└ @ AdvancedHMC /Users/kcft114/.julia/packages/AdvancedHMC/YWXfk/src/sampler.jl:77
a = ch[:a].value.data[:,:,1]
b = ch[:b].value.data[:,:,1]
s = ch[:s].value.data[:,:,1]
pl_a = plot_par(a)
vline!([a_true])
pl_b = plot_par(b)
vline!([b_true])
pl_s = plot_par(s)
vline!([1])
display(pl_a);
display(pl_b)
display(pl_s)
##############################################
# binomial likelihood
##############################################
function invlogit(x)
    exp.(x) ./ (1 .+ exp.(x))
end
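A caveat worth noting (my addition, not in the original notebook): for large positive `x`, `exp(x)` overflows to `Inf` and `Inf / (1 + Inf)` evaluates to `NaN`. An equivalent but numerically safer formulation avoids this:

```julia
# Numerically stable inverse logit (hypothetical helper, not in the course code):
# 1 / (1 + exp(-x)) never forms Inf/Inf, since exp(-x) underflows to 0 for large x.
invlogit_stable(x) = 1 ./ (1 .+ exp.(-x))

invlogit_stable(0.0)     # 0.5
invlogit_stable(1000.0)  # 1.0, whereas exp(1000)/(1 + exp(1000)) gives NaN
```

For the moderate values produced by the priors below, the naive version is fine; the stable form only matters when the linear predictor can become large.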
invlogit (generic function with 1 method)
n = 1000
x = rand(Normal(), n)
alpha_true = -0.3
beta_true = 0.7
ps = alpha_true .+ beta_true*x
ps = invlogit(ps)
y = [rand(Binomial(10, p)) for p in ps];
histogram(ps, bins = 20, title = "p", label = "", size = [300, 300])
@model binom(x, y) = begin
    alpha ~ Normal(0, 1)
    beta ~ Normal(0, 1)
    p = invlogit(alpha .+ beta*x)
    for i in eachindex(y)
        y[i] ~ Binomial(10, p[i])
    end
end
binom (generic function with 3 methods)
ch = sample(binom(x, y), NUTS(niter, 0.65));
┌ Info: Found initial step size
│ init_ϵ = 0.025
└ @ Turing.Inference /Users/kcft114/.julia/packages/Turing/m05p3/src/inference/hmc.jl:365
┌ Info: Finished 1000 adaptation steps
│ adaptor = StanHMCAdaptor(n_adapts=1000, pc=DiagPreconditioner, ssa=NesterovDualAveraging(γ=0.05, t_0=10.0, κ=0.75, δ=0.65, state.ϵ=1.079233753616307), init_buffer=75, term_buffer=50)
│ τ.integrator = Leapfrog(ϵ=1.08)
│ h.metric = DiagEuclideanMetric([0.0004598, 0.000638437])
└ @ AdvancedHMC /Users/kcft114/.julia/packages/AdvancedHMC/YWXfk/src/sampler.jl:67
┌ Info: Finished 20000 sampling steps in 21.369849906 (s)
│ h = Hamiltonian(metric=DiagEuclideanMetric([0.0004598, 0.000638437]))
│ τ = NUTS{Multinomial}(integrator=Leapfrog(ϵ=1.08), max_depth=5), Δ_max=1000.0)
│ EBFMI(Hs) = 15333.80231513245
│ mean(αs) = 0.8513041124541438
└ @ AdvancedHMC /Users/kcft114/.julia/packages/AdvancedHMC/YWXfk/src/sampler.jl:77
show(ch)
Object of type Chains, with data of type 19000×12×1 Array{Union{Missing, Float64},3}
Log evidence = 0.0
Iterations = 1:19000
Thinning interval = 1
Chains = 1
Samples per chain = 19000
internals = eval_num, lp, acceptance_rate, hamiltonian_energy, is_accept, log_density, n_steps, numerical_error, step_size, tree_depth
parameters = alpha, beta
2-element Array{ChainDataFrame,1}
Summary Statistics
│ Row │ parameters │ mean │ std │ naive_se │ mcse │ ess │ r_hat │
│ │ Symbol │ Float64 │ Float64 │ Float64 │ Float64 │ Any │ Any │
├─────┼────────────┼───────────┼───────────┼─────────────┼─────────────┼─────────┼─────────┤
│ 1 │ alpha │ -0.287012 │ 0.0211922 │ 0.000153744 │ 0.000141211 │ 19000.0 │ 1.00007 │
│ 2 │ beta │ 0.710331 │ 0.0239084 │ 0.00017345 │ 0.000159387 │ 18697.2 │ 1.00008 │
Quantiles
│ Row │ parameters │ 2.5% │ 25.0% │ 50.0% │ 75.0% │ 97.5% │
│ │ Symbol │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │
├─────┼────────────┼──────────┼───────────┼───────────┼───────────┼───────────┤
│ 1 │ alpha │ -0.32898 │ -0.301143 │ -0.287089 │ -0.272646 │ -0.245106 │
│ 2 │ beta │ 0.663797 │ 0.694189 │ 0.710112 │ 0.726482 │ 0.758001 │
alpha = ch[:alpha].value.data[:,:,1]
beta = ch[:beta].value.data[:,:,1]
pl_a = plot_par(alpha)
pl_b = plot_par(beta)
display(pl_a);
display(pl_b);
(Graphics by Eric Ma)
(Graphics by Eric Ma)
"Gaussian Process Behaviour in Wide Deep Neural Networks"
Alexander Matthews
R. Neal (1994):
A single-layer network (MLP) with $D$ input features and $K$ hidden units, where the prior over the weights is appropriately scaled.
As $K$ tends to infinity, the hidden layer converges in distribution to a multivariate normal $N(0,C)$ with covariance matrix $C.$
(Proof: multivariate version of the Central Limit Theorem)
"In distribution" is the key!
(Alexander Matthews, NeurIPS 2019)
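Neal's argument can be seen empirically with a small Monte Carlo sketch (my own illustration, not from the slides; the $1/\sqrt{K}$ scaling of the hidden-to-output weights is the standard choice assumed here):

```julia
# Sketch: draw many random single-hidden-layer networks and look at the
# distribution of the output at a fixed input. With hidden-to-output weights
# scaled by 1/sqrt(K), the output is a sum of K i.i.d. terms, so by the CLT
# it approaches a zero-mean Gaussian as K grows.
using Statistics, Random

Random.seed!(0)

function nn_output(x, K)
    w1 = randn(K); b1 = randn(K)        # input-to-hidden weights and biases
    w2 = randn(K) ./ sqrt(K)            # hidden-to-output weights, scaled by 1/sqrt(K)
    sum(w2 .* tanh.(w1 .* x .+ b1))     # scalar output for a scalar input x
end

outs = [nn_output(0.5, 1000) for _ in 1:5_000]
println((mean(outs), var(outs)))        # mean ≈ 0; the variance is the kernel value C(x, x)
```

Repeating this for a pair of inputs $(x, x')$ and taking the empirical covariance of the two outputs would approximate the off-diagonal kernel entry $C(x, x')$.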
Matthews et al (2018),
generalisation of the result for multiple hidden layers:
As $K_1, K_2, ... , K_i$ all tend to infinity, the output layer $i$ is distributed as $N(0, C^{(i)}),$
where $C^{(i)}$ is recursively defined from $C^{(i-1)}.$
(Alexander Matthews, NeurIPS 2019)
Given what we currently know about the world, how should we decide what to do, taking into account future events and observations that may change our conclusions?
Decision theory offers a way to automatically design and run experiments and to optimally construct clinical trials.
As we gather more information, we learn more about the world. However, what we learn depends on which actions we take.
For example, if we always take the same route to work, we learn how long this route takes on different days and at different times of the week. However, we obtain no information about how long other routes take, so we potentially miss out on better choices than the one we usually follow. This phenomenon gives rise to the so-called exploration-exploitation trade-off.
The exploration-exploitation trade-off arises when data collection is interactive.
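The trade-off can be made concrete with a classic toy problem (my own illustrative sketch, not part of the course code): an ε-greedy agent on a two-armed Bernoulli bandit, which explores a random arm with probability ε and otherwise exploits the arm with the best estimated reward.

```julia
# Sketch: ε-greedy strategy on a two-armed Bernoulli bandit.
# All names and parameter values here are illustrative assumptions.
using Random

Random.seed!(42)
p = [0.3, 0.7]                 # true success rates, unknown to the agent
ε = 0.1                        # exploration probability
counts  = zeros(Int, 2)        # pulls per arm
rewards = zeros(2)             # accumulated reward per arm

for t in 1:10_000
    # estimated value per arm; Inf forces each arm to be tried at least once
    est = [counts[a] == 0 ? Inf : rewards[a] / counts[a] for a in 1:2]
    a = rand() < ε ? rand(1:2) : argmax(est)   # explore vs exploit
    r = rand() < p[a] ? 1.0 : 0.0              # Bernoulli reward
    counts[a] += 1
    rewards[a] += r
end

println(counts)   # the better arm (p = 0.7) ends up pulled far more often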
$\theta$ - parameter
$p(\theta)$ - prior on $\theta$
$A$ - all possible actions one can take
$l: A \times \Theta \to \mathbb{R}$ - loss function
The posterior risk of an action $a \in A$ is the expectation of the loss under the posterior:
$$ r(a) = E_\theta [ l(a, \theta) \mid D] = \int l(a, \theta) \, p(\theta \mid D) \, d\theta $$
$l(\theta, \hat{\theta}) = (\theta - \hat{\theta})^2$ -> mean of the posterior
$l(\theta, \hat{\theta}) = \mid \theta - \hat{\theta} \mid$ -> median of the posterior
$l(\theta, \hat{\theta}) = I_{\theta \ne \hat{\theta}}$ -> mode of the posterior
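In practice these Bayes estimators are read off posterior samples. A minimal base-Julia sketch (my addition; the synthetic draws stand in for MCMC samples of $\theta$):

```julia
# Sketch: point estimates minimising different losses, computed from
# posterior draws. Here the "posterior" is a synthetic N(2, 0.5^2) sample.
using Statistics, Random

Random.seed!(1)
draws = randn(10_000) .* 0.5 .+ 2.0   # stand-in for posterior samples of θ

post_mean   = mean(draws)    # minimises expected squared loss
post_median = median(draws)  # minimises expected absolute loss

println((post_mean, post_median))     # both ≈ 2.0 for this symmetric posterior
```

For the mode (0-1 loss) one would typically use a kernel density estimate of the draws and take its argmax, since the mode of a finite continuous sample is not well defined.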
Probability can be used to describe how likely an event is; utility can be used to describe how desirable it is.
an example of Julia + Turing being used for a publication
learn Julia:
Bayesian Inference:
using Pkg
Pkg.status()
Status `~/.julia/environments/v1.0/Project.toml`
  [0bf59076] AdvancedHMC v0.2.2
  [c52e3926] Atom v0.8.8
  [76274a88] Bijectors v0.3.1
  [336ed68f] CSV v0.5.9
  [159f3aea] Cairo v0.6.0
  [324d7699] CategoricalArrays v0.5.5
  [3a865a2d] CuArrays v1.1.0
  [a93c6f00] DataFrames v0.19.1
  [31c24e10] Distributions v0.21.1
  [ced4e74d] DistributionsAD v0.1.0
  [587475ba] Flux v0.8.3
  [da1fdf0e] FreqTables v0.3.1
  [38e38edf] GLM v1.3.1
  [28b8d3ca] GR v0.41.0
  [c91e804a] Gadfly v1.0.1
  [93e5fe13] Hyperopt v0.2.4
  [7073ff75] IJulia v1.19.0
  [18b7da76] JuliaAcademyData v0.1.0 #master (https://github.com/JuliaComputing/JuliaAcademyData.jl)
  [e5e0dc1b] Juno v0.7.0
  [5ab0869b] KernelDensity v0.5.1
  [b964fa9f] LaTeXStrings v1.0.3
  [c7f686f2] MCMCChains v0.3.11
  [f0e99cf1] MLBase v0.8.0
  [cc2ba9b6] MLDataUtils v0.5.0
  [270893ea] MLPreprocessing v0.0.0 #master (https://github.com/JuliaML/MLPreprocessing.jl)
  [ee78f7c6] Makie v0.9.4
  [442fdcdd] Measures v0.3.0
  [47be7bcc] ORCA v0.2.1
  [429524aa] Optim v0.19.2
  [3b7a836e] PGFPlots v3.1.3
  [58dd65bb] Plotly v0.2.0
  [f0f68f2c] PlotlyJS v0.12.5
  [91a5bcdd] Plots v0.25.3
  [92933f4c] ProgressMeter v1.0.0
  [438e738f] PyCall v1.91.2
  [d330b81b] PyPlot v2.8.1
  [ce6b1742] RDatasets v0.6.3
  [682df890] Stan v3.5.0
  [2913bbd2] StatsBase v0.32.0
  [4c63d2b9] StatsFuns v0.8.0
  [3eaba693] StatsModels v0.6.2
  [f3b207a7] StatsPlots v0.11.0
  [9f7883ad] Tracker v0.2.2
  [fce5fe82] Turing v0.6.23 #master (https://github.com/TuringLang/Turing.jl.git)
  [e88e6eb3] Zygote v0.4.1